
2023-11-22 01:29 | Source: compiled from the web

64_Converting between strings and numbers and formatting them in pandas

This article explains how to convert between strings and numbers in pandas.DataFrame and pandas.Series, and how to change the format of strings.

The following topics are covered.

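Type conversion (casting): astype()
  Converting numbers to strings
  Converting strings to numbers
  Overwriting a column and adding a new column
Converting between binary, octal, and hexadecimal numbers and strings
  Converting integers to strings: bin(), oct(), hex(), format()
  Converting strings to integers: int() with a specified base
  Converting between bases
Zero-padding and aligning strings
  Zero-padding
  Alignment (right, center, left)
Converting to a string in any format: format()
  Zero-padding, alignment
  Binary, octal, hexadecimal
  Decimal places, significant digits
  Exponent notation
  Percent display
  A note on rounding

Type conversion (casting): astype()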

The following pandas.DataFrame is used as an example.

    import pandas as pd

    df = pd.DataFrame({'i': [0, 10, 200], 'f': [0, 0.9, 0.09],
                       's_i': ['0', '10', '200'], 's_f': ['0', '0.9', '0.09']})

    print(df)
    #      i     f  s_i   s_f
    # 0    0  0.00    0     0
    # 1   10  0.90   10   0.9
    # 2  200  0.09  200  0.09

    print(df.dtypes)
    # i        int64
    # f      float64
    # s_i     object
    # s_f     object
    # dtype: object

Converting numbers to strings

    print(df['i'].astype(str))
    # 0      0
    # 1     10
    # 2    200
    # Name: i, dtype: object

    print(df['f'].astype(str))
    # 0     0.0
    # 1     0.9
    # 2    0.09
    # Name: f, dtype: object

The number of decimal places in the resulting string is determined automatically. To choose it yourself, use the format() method described later.

You can also convert an entire pandas.DataFrame at once. However, every column must be convertible to the specified type.

    print(df.astype(str))
    #      i     f  s_i   s_f
    # 0    0   0.0    0     0
    # 1   10   0.9   10   0.9
    # 2  200  0.09  200  0.09

    print(df.astype(str).dtypes)
    # i      object
    # f      object
    # s_i    object
    # s_f    object
    # dtype: object
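When only some columns should change type, astype() also accepts a dict that maps column names to target types. The following is a minimal sketch, assuming the same example frame as above; the variable name converted is only illustrative.

```python
import pandas as pd

# Same layout as the example frame used in this article.
df = pd.DataFrame({'i': [0, 10, 200], 'f': [0, 0.9, 0.09],
                   's_i': ['0', '10', '200'], 's_f': ['0', '0.9', '0.09']})

# Map each column name to its target type.
# Columns that are not listed (here 'f') keep their current dtype.
converted = df.astype({'i': str, 's_i': int, 's_f': float})

print(converted.dtypes)
```

This avoids the restriction that every column must be convertible to one common type, since each column gets its own target.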

You can also convert between integers (int) and floating-point numbers (float).

    print(df['i'].astype(float))
    # 0      0.0
    # 1     10.0
    # 2    200.0
    # Name: i, dtype: float64

    print(df['f'].astype(int))
    # 0    0
    # 1    0
    # 2    0
    # Name: f, dtype: int64

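As the example shows, converting from float to int truncates the fractional part (the value is rounded toward zero). If you want to round to the nearest integer or round half to even, see the following article.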

63_Rounding numbers in pandas

Converting strings to numbers

Convert strings (str) to numbers (int, float).

    print(df['s_i'].astype(int))
    # 0      0
    # 1     10
    # 2    200
    # Name: s_i, dtype: int64

    print(df['s_i'].astype(float))
    # 0      0.0
    # 1     10.0
    # 2    200.0
    # Name: s_i, dtype: float64

    print(df['s_f'].astype(float))
    # 0    0.00
    # 1    0.90
    # 2    0.09
    # Name: s_f, dtype: float64

Converting a string that contains a decimal point directly to int raises a ValueError. Convert it to float first, and then convert the result to int.

    # print(df['s_f'].astype(int))
    # ValueError: invalid literal for int() with base 10: '0.9'

    print(df['s_f'].astype(float).astype(int))
    # 0    0
    # 1    0
    # 2    0
    # Name: s_f, dtype: int64
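As an alternative to astype(float), pandas also provides pd.to_numeric(), which is not used in the original examples. The sketch below assumes that approach is acceptable; the Series s_dirty and its 'n/a' entry are hypothetical data added only to illustrate errors='coerce'.

```python
import pandas as pd

s_f = pd.Series(['0', '0.9', '0.09'])

# to_numeric() returns floats here because some values have a fractional part.
as_float = pd.to_numeric(s_f)

# Casting to int afterwards truncates toward zero, like astype(float).astype(int).
as_int = as_float.astype(int)

# Hypothetical dirty input: 'n/a' cannot be parsed as a number.
s_dirty = pd.Series(['1', '2.5', 'n/a'])

# errors='coerce' replaces unparseable strings with NaN instead of raising.
coerced = pd.to_numeric(s_dirty, errors='coerce')

print(as_int)
print(coerced)
```

Note that a Series containing NaN cannot be cast to a plain int dtype, so missing values would need to be handled before a final astype(int).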

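Converting numeric strings that contain thousands separators (commas) directly also raises a ValueError. Use the string method str.replace() to remove the commas (replace them with the empty string ''), and then convert the result to int or float.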

    s_sep = pd.Series(['1,000,000', '1,000', '1'])

    print(s_sep)
    # 0    1,000,000
    # 1        1,000
    # 2            1
    # dtype: object

    # print(s_sep.astype(int))
    # ValueError: invalid literal for int() with base 10: '1,000,000'

    print(s_sep.str.replace(',', '').astype(int))
    # 0    1000000
    # 1       1000
    # 2          1
    # dtype: int64

    print(s_sep.str.replace(',', '').astype(float))
    # 0    1000000.0
    # 1       1000.0
    # 2          1.0
    # dtype: float64

Overwriting a column and adding a new column

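You can also overwrite an existing column with the converted values or add them as a new column. The same applies to all the examples that follow.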

    df['i'] = df['i'].astype(str)
    print(df)
    #      i     f  s_i   s_f
    # 0    0  0.00    0     0
    # 1   10  0.90   10   0.9
    # 2  200  0.09  200  0.09

    df['f_s'] = df['f'].astype(str)
    print(df)
    #      i     f  s_i   s_f   f_s
    # 0    0  0.00    0     0   0.0
    # 1   10  0.90   10   0.9   0.9
    # 2  200  0.09  200  0.09  0.09

    print(df.dtypes)
    # i       object
    # f      float64
    # s_i     object
    # s_f     object
    # f_s     object
    # dtype: object

Converting between binary, octal, and hexadecimal numbers and strings

This section converts between integer values and strings written in binary, octal, or hexadecimal notation.

Converting integers to strings: bin(), oct(), hex(), format()

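The following pandas.Series is used as an example. Integer values can be written in code with a base prefix, such as 0xff for hexadecimal, but output such as print() shows them in decimal.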

    s_int = pd.Series([0xff, 0o77, 0b11])

    print(s_int)
    # 0    255
    # 1     63
    # 2      3
    # dtype: int64

To convert integer values to binary, octal, or hexadecimal strings, apply the built-in functions bin(), oct(), and hex() to each element.

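Use the map() method for the elements of a pandas.Series or of a single pandas.DataFrame column, and the applymap() method for all elements of a pandas.DataFrame (in pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map()).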

06_How to use map(), applymap(), and apply() in pandas

    print(s_int.map(bin))
    # 0    0b11111111
    # 1      0b111111
    # 2          0b11
    # dtype: object

    print(s_int.map(oct))
    # 0    0o377
    # 1     0o77
    # 2      0o3
    # dtype: object

    print(s_int.map(hex))
    # 0    0xff
    # 1    0x3f
    # 2     0x3
    # dtype: object
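The examples above convert a single Series. To convert every element of a DataFrame, one option that behaves the same across pandas versions is to combine apply() with Series.map(). This is only a sketch; the frame df_int is hypothetical and not part of the original article.

```python
import pandas as pd

# Hypothetical all-integer frame for illustration.
df_int = pd.DataFrame({'a': [255, 63, 3], 'b': [16, 8, 2]})

# apply() passes each column to the lambda as a Series;
# Series.map() then converts that column element by element.
df_hex = df_int.apply(lambda col: col.map(hex))

print(df_hex)
```

The result holds strings, so arithmetic on df_hex is no longer possible without converting back.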

Another option is format(). With format() you can choose whether to include the prefix (0b, 0o, 0x), apply zero-padding, and so on. Details are given later.

    print(s_int.map('{:b}'.format))
    # 0    11111111
    # 1      111111
    # 2          11
    # dtype: object

    print(s_int.map('{:#b}'.format))
    # 0    0b11111111
    # 1      0b111111
    # 2          0b11
    # dtype: object

    print(s_int.map('{:#010b}'.format))
    # 0    0b11111111
    # 1    0b00111111
    # 2    0b00000011
    # dtype: object

Converting strings to integers: int() with a specified base

The following pandas.DataFrame is used as an example.

    df_str = pd.DataFrame({'bin': ['0b01', '0b10', '0b11'],
                           'oct': ['0o07', '0o70', '0o77'],
                           'hex': ['0x0f', '0xf0', '0xff'],
                           'dec': ['1', '10', '11']})

    print(df_str)
    #     bin   oct   hex dec
    # 0  0b01  0o07  0x0f   1
    # 1  0b10  0o70  0xf0  10
    # 2  0b11  0o77  0xff  11

    print(df_str.dtypes)
    # bin    object
    # oct    object
    # hex    object
    # dec    object
    # dtype: object

Strings with a base prefix such as 0b, 0o, or 0x cannot be converted with astype().

    # print(df_str['bin'].astype(int))
    # ValueError: invalid literal for int() with base 10: '0b01'

Instead, apply int() to each element through an anonymous function (lambda expression), passing the base as the second argument of int().

    print(df_str['bin'].map(lambda x: int(x, 2)))
    # 0    1
    # 1    2
    # 2    3
    # Name: bin, dtype: int64

    print(df_str['oct'].map(lambda x: int(x, 8)))
    # 0     7
    # 1    56
    # 2    63
    # Name: oct, dtype: int64

    print(df_str['hex'].map(lambda x: int(x, 16)))
    # 0     15
    # 1    240
    # 2    255
    # Name: hex, dtype: int64
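The lambda is only there to fix the second argument of int(). As a sketch of an equivalent spelling, the standard library's functools.partial can bind the base keyword instead; the name parse_bin is an illustrative choice, not something from the original article.

```python
from functools import partial

import pandas as pd

s_bin = pd.Series(['0b01', '0b10', '0b11'])

# partial(int, base=2) behaves like lambda x: int(x, base=2).
parse_bin = partial(int, base=2)

result = s_bin.map(parse_bin)

print(result)
```

Whether this reads better than a lambda is a matter of taste; a named parser can be convenient when the same base is reused for several columns.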

If the strings have a prefix, set the second argument of int() to 0 and the base is determined automatically from the prefix.

    # In pandas 2.1 and later, df_str.map(...) replaces the deprecated applymap().
    print(df_str.applymap(lambda x: int(x, 0)))
    #    bin  oct  hex  dec
    # 0    1    7   15    1
    # 1    2   56  240   10
    # 2    3   63  255   11

Strings without a prefix can also be converted by passing the base as the second argument of int().

    print(df_str['dec'].map(lambda x: int(x, 2)))
    # 0    1
    # 1    2
    # 2    3
    # Name: dec, dtype: int64
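The same applies to other bases. The following sketch parses hexadecimal strings that carry no 0x prefix; the Series s_hex is hypothetical sample data.

```python
import pandas as pd

# Hypothetical hexadecimal strings without the 0x prefix.
s_hex = pd.Series(['ff', '3f', '3'])

# The base must be given explicitly because there is no prefix to infer it from.
parsed = s_hex.map(lambda x: int(x, 16))

print(parsed)
```

Here int(x, 0) would not work, because without a prefix 'ff' is not a valid literal for automatic base detection.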

Decimal strings with leading zeros can be converted with astype(), but note that int() with 0 as the second argument raises a ValueError for them.

    s_str_dec = pd.Series(['01', '10', '11'])

    print(s_str_dec)
    # 0    01
    # 1    10
    # 2    11
    # dtype: object

    print(s_str_dec.astype(int))
    # 0     1
    # 1    10
    # 2    11
    # dtype: int64

    # print(s_str_dec.map(lambda x: int(x, 0)))
    # ValueError: invalid literal for int() with base 0: '01'

Converting between bases

Bases can be converted by chaining the methods above. For example, octal strings are converted to hexadecimal strings as follows.

    print(df_str['oct'].map(lambda x: int(x, 8)).map(hex))
    # 0     0x7
    # 1    0x38
    # 2    0x3f
    # Name: oct, dtype: object

Zero-padding and aligning strings

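By applying string methods through the str accessor, you can zero-pad strings and align them (right, center, left). The following pandas.Series is used as an example.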

    s_str = pd.Series(['0', '10', 'xxx'])

    print(s_str)
    # 0      0
    # 1     10
    # 2    xxx
    # dtype: object

Zero-padding

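Use str.zfill() for zero-padding. The string is padded with zeros on the left until it reaches the number of characters given as the argument.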

    print(s_str.str.zfill(8))
    # 0    00000000
    # 1    00000010
    # 2    00000xxx
    # dtype: object

Alignment (right, center, left)

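Use str.rjust(), str.center(), and str.ljust() for right, center, and left alignment respectively. The first argument is the total width. If the second argument is omitted, the string is padded with spaces; if a character is given, that character is used as the fill.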

    print(s_str.str.rjust(8))
    # 0           0
    # 1          10
    # 2         xxx
    # dtype: object

    print(s_str.str.rjust(8, '_'))
    # 0    _______0
    # 1    ______10
    # 2    _____xxx
    # dtype: object

    print(s_str.str.center(8))
    # 0       0    
    # 1       10   
    # 2      xxx   
    # dtype: object

    print(s_str.str.center(8, '_'))
    # 0    ___0____
    # 1    ___10___
    # 2    __xxx___
    # dtype: object

    print(s_str.str.ljust(8))
    # 0    0       
    # 1    10      
    # 2    xxx     
    # dtype: object

    print(s_str.str.ljust(8, '_'))
    # 0    0_______
    # 1    10______
    # 2    xxx_____
    # dtype: object
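The str accessor also offers str.pad(), which is not used in the original examples and takes the padding side as a parameter. The sketch below assumes it is acceptable as an alternative when the alignment has to be chosen at run time. Note that side names the side where the fill characters are added, so side='left' produces right-aligned text.

```python
import pandas as pd

s_str = pd.Series(['0', '10', 'xxx'])

# Fill characters are added on the left, so the text ends up right-aligned.
right_aligned = s_str.str.pad(8, side='left', fillchar='_')

# Fill characters are added on the right, so the text ends up left-aligned.
left_aligned = s_str.str.pad(8, side='right', fillchar='_')

print(right_aligned)
print(left_aligned)
```

For these inputs the two results match str.rjust(8, '_') and str.ljust(8, '_') respectively.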

String methods of the str accessor cannot be applied to numeric columns. Convert the column to strings with astype() first, and then apply the method.

    s_num = pd.Series([0, 10, 100])

    # print(s_num.str.rjust(8, '_'))
    # AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

    print(s_num.astype(str).str.rjust(8, '_'))
    # 0    _______0
    # 1    ______10
    # 2    _____100
    # dtype: object

Converting to a string in any format: format()

By applying the string method format() to each element, you can convert values to strings in any format. The following pandas.DataFrame is used as an example.

    df = pd.DataFrame({'i': [0, 10, 100], 'f': [0.1234, 1.234, 12.34],
                       'round': [0.4, 0.5, 0.6]})

    print(df)
    #      i        f  round
    # 0    0   0.1234    0.4
    # 1   10   1.2340    0.5
    # 2  100  12.3400    0.6

    print(df.dtypes)
    # i          int64
    # f        float64
    # round    float64
    # dtype: object

Zero-padding, alignment

    print(df['i'].map('{:08}'.format))
    # 0    00000000
    # 1    00000010
    # 2    00000100
    # Name: i, dtype: object

    print(df['i'].map('{:_
